Exploratory Data Analysis of Greenhouse Gas Emissions in Canada¶

Introduction¶

Greenhouse gasses (GHG) in the atmosphere have been suggested to cause environmental threats such as global warming, sea level rise, and increased extreme weather events (Conference Board of Canada, 2013). As of 2020, the GHG emissions of Canada were the 11th highest of any country (ECCC, 2023). Although Canada makes up approximately 0.48% of the global population, its GHG emissions account for 1.5% of total emissions worldwide (United Nations, 2023; Statistics Canada, 2023; ECCC, 2023). As part of the Paris Agreement in 2015, the Government of Canada (GoC) has committed to reduce GHG emissions by 30% in relation to its 2005 levels, by 2030 (ECCC, 2023). More recently, a 2023 GoC initiative has stated a higher target of 40-45% GHG reductions from 2005 levels by 2030 (ECCC, 2023). As the sources of Canadian GHG emissions are geographically varied and are associated with a range of activities/economic sectors, the path to reducing emissions will involve a study of these sources in the context of GHG reduction opportunities. By comparatively visualizing provincial per-capita GHG emission, historical sector-specific Canadian GHG emissions, and the relation between Canadian Gross Domestic Product (GDP) and sector-specific GHG emissions, this project aims to contribute to such a study. The goal of this project is to provide a clear and informative overview of GHG emissions in Canada, with a focus on quantifying the relationship between GHG emissions and economic activity.

Data Set¶

The datasets in this project will be pulled from publicly accessible government websites. As per the terms and conditions of both the Natural Resources Canada website and the Canada Energy Regulator website; content can be reproduced and used as long as it is for non-commercial purposes, without any additional permission required. One of the datasets originates from the Canada Energy Regulator website; this showcases a map of Canada with a breakdown of energy emissions usage based on each province. The data will be pulled in a treemap format from the website and stored in a tabular format. Each column will indicate a year from 2010-2020 and each row will indicate an observation based on a single province/territory (C. E. R., 2023). The plan is to create two tabular datasets from this format; one of which will present Natural Gas usage based on each province and the second will present Crude Oil usage based on each province. The next two datasets will be pulled from the Natural Resources Canada website. Both these datasets will have columns from 2000-2020. The first dataset will present ‘Canada’s GHG Emissions by Sector, End Use and Subsector’. The observations in the first dataset will be split up into the following categories: Residential, Commercial/Institutional, Industrial, Passenger Transportation, Freight Transportation, Off-Road and Agriculture. The values of each observation is listed in the Megatonne unit of measure (Natural Resources Canada, 2023). Both datasets will be presented in a tabular format. The second dataset consists of ‘Commodity Prices and Background Indicators’. As with the data pulled from the Canada Energy Regulator website, this data will also support the main topic in addressing GHG Emissions within Canada. The rows are subdivided into ‘Commodity Prices and Total GDP’ consisting of each sector (Natural Resources Canada, 2023).
The final dataset encompasses the population of Canada based on each province. The dataset, also presented in tabular format, shows the breakdown of population on a quarterly basis from 2000 through 2020, and each row is separated based on province or territory. This is directly pulled from the Statistics Canada website (Statistics Canada, 2023). One important thing to note is that these population values will be estimates.

Guiding Questions¶

Collectively, the guiding questions of this project are designed to depict the spatial-temporal trends of GHG emissions in Canada. The purpose of these depictions are to highlight potential opportunities for reductions in GHG emissions; per-capita representations of spatial GHG emissions are likely to be more relevant than absolute abundances. Absolute levels of GHG emissions may over-emphasize areas that have high GHG emissions due simply to a larger population burden. Similarly, the GHG emissions of a specific economic sector may be best represented relative to the economic contribution of the same sector (measured as a proportion of GDP). In this way, modeling per-capita provincial GHG emissions and sector-specific GHG emissions relative to economic output may reveal areas wherein GHG emissions have been proportionally reduced and, more importantly, areas of potential improvement.

How have GHG emissions relative to economic output changed over the past two decades in the major sectors of Canada’s economy?

As part of this question, the size of an economic sector will be compared to its GHG emissions. Visualizing this relationship should provide foundational insight into the sources of GHG emissions. Further insights can then be drawn by visualizing the trend in GHG emissions per dollar value of an economic sector. In theory, a decrease in the proportion of GHG emissions per dollar could signify that the sector is becoming more efficient, or ‘green’. In contrast, sectors where GHG emissions are increasing per dollar may warrant further investigation as candidates for GHG emission reduction. Although the relationship between productivity and sector-specific GHG emissions will be informative, the price of commodities may have an important impact on GHG emissions that is not fully captured by the GHG per dollar ratio. In particular, it is known that higher oil and gas prices enable unconventional oil and gas production methods to become economically viable (Helbling, 2013). Different production methods are likely to have different GHG emission profiles, which may have a material impact on the overall GHG emissions of a sector.

How have the per-capita GHG emissions of Canadian Provinces and Territories changed over the past two decades?

Each region plays a unique role in contributing to Canada’s emission profile, with varying levels of economic activity, energy resources, and climate policies. Understanding how GHG emissions over the past two decades have evolved at the provincial and territorial level provides insights into which regions are making progress in reducing emissions, which ones are maintaining emissions, and which ones require more targeted efforts. Insights can be drawn by visualizing trends in per capita emissions for each province and territory. To provide a comprehensive analysis, the potential impacts of climate policies will be considered in each region by identifying key climate policies and assessing their timelines. Understanding the per capita emissions by region may help us evaluate the equity of emission reductions, in that the burden of reducing emissions is equitably distributed among regions, considering their economic and energy profiles.

Data Wrangling and Visualization for Guiding Question 1¶

As our first guiding question looked at the economic impact of GHG emissions, we began by analyzing GHG emissions by sector and the commodity prices which provided GDP values for Canada's largest sectors. The initial dataset consisting of subsectors had duplicate titles for similar GHG end uses (e.g. Space Heating is found in both residential and commercial properties). To account for this, we decided to rename the respective rows with a hyphen to indicate which sector it falls under. We removed the subheaders with totals for each sector to focus our data on the end uses and to create a separate table for column totals (Ghg1 table).

In [24]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#import both the GHG Emissions by Sector file and the Commodity Prices and Background Indicators file
Ghg = pd.read_excel("GHG Emissions by Sector.xlsx")
Ghg1 = pd.read_excel("GHG Emissions by Sector.xlsx")
CB = pd.read_excel("Commodity Prices and Background Indicators.xlsx")
CB1 = pd.read_excel("Commodity Prices and Background Indicators.xlsx")

#Ghg1 and CB1 will be used solely for the purpose of filtering and merging both of these dataframes
#The resulting dataframe is DF3

Ghg = Ghg.round(2) #Removed additional significant digits and limited it to 2 per value.
Ghg = Ghg.rename(columns={'Unnamed: 0': 'Source of GHG'}) #Added a title for column indicating each source
#Changed the names of values that had duplicate entries to better reflect the source of GHG.
Ghg.loc[2, 'Source of GHG'] = 'Space Heating - Residential'
Ghg.loc[3, 'Source of GHG'] = 'Water Heating - Residential'
Ghg.loc[7, 'Source of GHG'] = 'Lighting - Residential'
Ghg.loc[8, 'Source of GHG'] = 'Space Cooling - Residential'
Ghg.loc[10, 'Source of GHG'] = 'Space Heating - Commercial'
Ghg.loc[11, 'Source of GHG'] = 'Water Heating - Commercial'
Ghg.loc[14, 'Source of GHG'] = 'Lighting - Commercial'
Ghg.loc[15, 'Source of GHG'] = 'Space Cooling - Commercial'
Ghg.loc[31, 'Source of GHG'] = 'Light Trucks - Passenger'
Ghg.loc[34, 'Source of GHG'] = 'Air - Passenger'
Ghg.loc[35, 'Source of GHG'] = 'Rail - Passenger'
Ghg.loc[37, 'Source of GHG'] = 'Light Trucks - Freight'
Ghg.loc[40, 'Source of GHG'] = 'Air - Freight'
Ghg.loc[41, 'Source of GHG'] = 'Rail - Freight'
#removed subheadings which had total values of GHG per sector to better focus our dataset.
#Ghg.drop([0, 1, 9, 17, 28, 29, 36], axis=0, inplace=True)

The second dataset for our first guiding question entailed GDP generated by the biggest sectors in Canada on a yearly basis, namely Industrial, Commercial and Agriculture. The dataset also included GDP through electricity generation. For the purpose of our analysis, we focused on the 3 main sectors. Later on, we will be combining these two datasets (GHG Emissions by Sector and GDP by Sector).

In [25]:
#dropped initial row which showed total GHG Emissions to focus on each sector specifically.
CB.drop([0], axis=0, inplace=True)
CB = CB.round(2)
CB = CB.rename(columns={'Unnamed: 0': 'Source of GHG'})
CB
Out[25]:
Source of GHG 2000 2001 2002 2003 2004 2005 2006 2007 2008 ... 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
1 Industrial 407435.20 404893.90 414638.90 419498.50 433942.00 445641.20 449969.50 448495.90 437702.20 ... 437861.50 447856.50 459845.50 479792.90 473645.10 464860.20 486485.20 503277.60 501449.5 467445.0
2 Commercial/Institutional 823297.00 849552.00 882345.00 903023.00 932062.00 962288.00 1000647.00 1032522.00 1051936.00 ... 1105968.00 1126960.00 1152136.00 1179705.00 1196301.00 1218749.00 1252804.00 1289520.00 1323236.0 1273834.0
3 Agriculture 21276.05 21233.06 20139.70 22995.43 25620.29 25919.98 25881.79 24518.72 26710.91 ... 25804.04 26274.75 31980.63 28653.57 30322.48 32602.95 33461.52 33133.78 35179.4 37025.0
4 Electricity Generation 26218.92 25645.92 26979.08 27557.83 27696.92 29514.00 29096.17 30849.00 32367.00 ... 31475.92 30627.00 30962.00 31990.00 32397.00 33540.00 33989.00 34544.00 34824.0 34176.0

4 rows × 22 columns

The next task was to take our 3 main sectors - Industrial, Commercial, Agriculture and combine the amount of GHG emissions that were emitted by each sector and compare that to the total GDP of that given year. We will need the row of totals from the GDP dataset, as well as the sector totals for each year which we had previously removed from the Ghg dataframe.

In [26]:
#this will pull the rows consisting of totals for each sector from our Ghg dataset.
Ghg1 = Ghg1.iloc[[9, 17, 44]]
Ghg1 = Ghg1.rename(columns={'Unnamed: 0': 'Source of GHG'})
Ghg1.loc['Total'] = Ghg1.sum(numeric_only=True)
#we also created an additional row which sums the total GHG per year.
Ghg1.loc['Total', 'Source of GHG'] = 'Total GHG per Year'
Ghg1
Out[26]:
Source of GHG 2000 2001 2002 2003 2004 2005 2006 2007 2008 ... 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
9 Commercial/Institutional (Mt of CO2e) 55.105833 56.536416 59.064334 62.027890 59.021386 55.983739 52.163896 54.511811 53.391508 ... 50.091385 46.805387 47.566451 49.606041 49.399903 49.398433 52.325620 52.259235 54.232999 50.498562
17 Industrial (Mt of CO2e) 160.355409 158.726351 159.201689 165.741093 168.367031 164.390842 164.723550 174.256674 165.301070 ... 168.166005 168.377670 166.845965 167.363194 171.675708 166.879338 171.757890 179.198073 181.695153 165.606838
44 Agriculture (Mt of CO2e) 15.602440 15.001209 14.327296 14.981179 14.866488 15.129819 15.242616 15.702339 15.764766 ... 16.938284 16.471504 17.153562 17.757157 18.346355 18.554234 18.988716 19.568804 18.843595 17.999334
Total Total GHG per Year 231.063683 230.263976 232.593319 242.750162 242.254905 235.504400 232.130062 244.470823 234.457344 ... 235.195673 231.654562 231.565977 234.726392 239.421967 234.832005 243.072227 251.026112 254.771747 234.104735

4 rows × 22 columns

In [27]:
#For the purpose of our analysis we removed the electricity generation row and focused on Industrial, Commercial, Agriculture GDP values.
CB1.drop([0], axis=0, inplace=True)
CB1.drop([4], axis=0, inplace=True)
CB1 = CB1.rename(columns={'Unnamed: 0': 'GDP Generated by Sector'})
CB1.loc['Total'] = CB1.sum(numeric_only=True)
#we also created an additional row which sums the total GDP generated each year
CB1.loc['Total', 'GDP Generated by Sector'] = 'Total GDP Per Year'
CB1 = CB1.round(2)
CB1
Out[27]:
GDP Generated by Sector 2000 2001 2002 2003 2004 2005 2006 2007 2008 ... 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
1 Industrial 407435.20 404893.90 414638.9 419498.50 433942.00 445641.20 449969.50 448495.90 437702.20 ... 437861.50 447856.50 459845.50 479792.90 473645.10 464860.20 486485.20 503277.60 501449.5 467445.0
2 Commercial/Institutional 823297.00 849552.00 882345.0 903023.00 932062.00 962288.00 1000647.00 1032522.00 1051936.00 ... 1105968.00 1126960.00 1152136.00 1179705.00 1196301.00 1218749.00 1252804.00 1289520.00 1323236.0 1273834.0
3 Agriculture 21276.05 21233.06 20139.7 22995.43 25620.29 25919.98 25881.79 24518.72 26710.91 ... 25804.04 26274.75 31980.63 28653.57 30322.48 32602.95 33461.52 33133.78 35179.4 37025.0
Total Total GDP Per Year 1252008.25 1275678.96 1317123.6 1345516.93 1391624.29 1433849.18 1476498.29 1505536.62 1516349.11 ... 1569633.54 1601091.25 1643962.13 1688151.47 1700268.58 1716212.15 1772750.72 1825931.38 1859864.9 1778304.0

4 rows × 22 columns

Finally, we merged the two dataframes (one consisting of the total GDP generated by sector per year and the second dataset which shows the total GHG emitted by sector per year). Recall, we focused on three main sectors (Industrial, Commercial and Agriculture). Once merged, we proceeded to transpose the data to have each year show by row value as the index and each respective column indicate the sector. The last two columns showcase the total GHG per Year and total GDP per Year.

With both dataframes merged together, this makes it easier to compare and visualize our dataset to address our guiding question.

In [28]:
DF3 = Ghg1.merge(CB1, how='outer')
DF3 = DF3.transpose()
DF3 = DF3.drop(DF3.columns[[4, 5, 6]], axis=1)
DF3 = DF3.drop(labels=["GDP Generated by Sector"], axis=0, inplace=False)
DF3.columns = DF3.iloc[0]
DF3 = DF3[1:]
DF3
DF3.columns = ['Commercial/Institutional (Mt of CO2e)', 'Industrial (Mt of CO2e)','Agriculture (Mt of CO2e)','Total GHG per Year', 'Total GDP Per Year']
DF3
Out[28]:
Commercial/Institutional (Mt of CO2e) Industrial (Mt of CO2e) Agriculture (Mt of CO2e) Total GHG per Year Total GDP Per Year
2000 55.105833 160.355409 15.60244 231.063683 1252008.25
2001 56.536416 158.726351 15.001209 230.263976 1275678.96
2002 59.064334 159.201689 14.327296 232.593319 1317123.6
2003 62.02789 165.741093 14.981179 242.750162 1345516.93
2004 59.021386 168.367031 14.866488 242.254905 1391624.29
2005 55.983739 164.390842 15.129819 235.5044 1433849.18
2006 52.163896 164.72355 15.242616 232.130062 1476498.29
2007 54.511811 174.256674 15.702339 244.470823 1505536.62
2008 53.391508 165.30107 15.764766 234.457344 1516349.11
2009 50.680115 153.227149 13.307905 217.215169 1468674.79
2010 50.474459 162.094963 15.63274 228.202162 1520466.55
2011 50.091385 168.166005 16.938284 235.195673 1569633.54
2012 46.805387 168.37767 16.471504 231.654562 1601091.25
2013 47.566451 166.845965 17.153562 231.565977 1643962.13
2014 49.606041 167.363194 17.757157 234.726392 1688151.47
2015 49.399903 171.675708 18.346355 239.421967 1700268.58
2016 49.398433 166.879338 18.554234 234.832005 1716212.15
2017 52.32562 171.75789 18.988716 243.072227 1772750.72
2018 52.259235 179.198073 19.568804 251.026112 1825931.38
2019 54.232999 181.695153 18.843595 254.771747 1859864.9
2020 50.498562 165.606838 17.999334 234.104735 1778304.0

To take our data visualization skills further, we were able to take our merged dataframe (DF3) and create a visual dashboard. Through the use of hvplot in Pandas and the panel library in Python, we can seamlessly create a dashboard which users can toggle between different sectors and generate a quick line and bar plot showcasing GHG emissions by sector as well as total GHG and GDP generated on a yearly basis.

In [29]:
import pandas as pd
import numpy as np
import panel as pn
pn.extension("tabulator")

import hvplot.pandas

The following code below will generate an interactive dataframe using our DF3 dataframe. This will generate a table with the year column and one of the 5 columns from the DF3 dataframe along with a toggle to switch between the different columns.

In [30]:
DF3 = DF3.reset_index()

iDF3 = DF3.interactive()

yaxis_ghg = pn.widgets.RadioButtonGroup(
    name='Y axis',
    options=['Commercial/Institutional (Mt of CO2e)', 'Industrial (Mt of CO2e)', 'Agriculture (Mt of CO2e)','Total GHG per Year','Total GDP Per Year'],
    button_type='success'
)

ghg_pipeline = (
    iDF3
    .groupby(['index'])[yaxis_ghg].mean()
    .to_frame()
    .reset_index()
    .sort_values(by='index')
    .reset_index(drop=True)
)

The code below will use the interactive dataframe we created and generate a line plot and bar plot that can toggle between the various columns. The x-axis will contain the years from our first columns (2000-2020).

In [31]:
co2_plot = ghg_pipeline.hvplot(x = 'index', y=yaxis_ghg,line_width=4, title="GHG emissions by Sector & Totals")
ghg_bar_plot = ghg_pipeline.hvplot(kind='bar', x='index', y=yaxis_ghg, title='GHG Emissions by Usage and Total GDP')

Using the FastListTemplate from the Panel library, we are able to use both plots created and implement it into a dashboard.

In [32]:
template = pn.template.FastListTemplate(
    title='GHG Emissions across Canada',
    sidebar=[pn.pane.Markdown("# GHG Emissions and their impact on GDP")],
    main=[pn.Column(yaxis_ghg, co2_plot.panel(width=700), margin=(0,25)),
                 pn.Column(yaxis_ghg, ghg_bar_plot.panel(width=800))],
    accent_base_color="#eab676",
    header_background="#1e81b0",
)

template.show()
template.servable();
Launching server at http://localhost:57657
C:\Users\hurdg\anaconda3\envs\geo_env\Lib\site-packages\holoviews\core\data\pandas.py:39: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  return dataset.data.dtypes[idx].type
C:\Users\hurdg\anaconda3\envs\geo_env\Lib\site-packages\holoviews\core\data\pandas.py:39: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  return dataset.data.dtypes[idx].type
C:\Users\hurdg\anaconda3\envs\geo_env\Lib\site-packages\holoviews\core\data\pandas.py:39: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  return dataset.data.dtypes[idx].type
WARNING:bokeh.core.validation.check:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: Column(id='f3098231-a6f5-43ad-bfe4-6f651a0fd721', ...)
WARNING:bokeh.core.validation.check:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: Column(id='e190af2e-6838-47c2-b3aa-e1dd3401e0cf', ...)

We will transpose the first data table and call it , by this step we make it easy to use the choosen Points and to name it in propper way in the graphs

In [33]:
G=Ghg.transpose()
G.columns = G.iloc[0]
G=G.iloc[1:]
G = G.rename(columns={'Unnamed: 0': 'Year'})
G=G.round(2 )
G
Out[33]:
Source of GHG Total GHG Emissions Including Electricity (Mt of CO2e) Residential (Mt of CO2e) Space Heating - Residential Water Heating - Residential Appliances Major Appliances Other Appliances Lighting - Residential Space Cooling - Residential Commercial/Institutional (Mt of CO2e) ... Rail - Passenger Freight Transportation (Mt of CO2e) Light Trucks - Freight Medium Trucks Heavy Trucks Air - Freight Rail - Freight Marine Off-Road (Mt of CO2e) Agriculture (Mt of CO2e)
2000 465.71 74.53 46.18 13.84 10.46 7.8 2.66 3.28 0.78 55.11 ... 0.23 64.81 10.24 10.77 28.97 0.56 6.36 7.92 5.89 15.6
2001 463.93 74.14 43.3 14.32 11.49 8.43 3.06 3.66 1.39 56.54 ... 0.23 65.24 10.38 12.35 27.18 0.47 6.31 8.55 6.56 15.0
2002 469.76 75.76 45.46 14.22 10.95 7.88 3.07 3.49 1.63 59.06 ... 0.21 64.0 10.77 11.94 26.84 0.51 5.74 8.2 6.73 14.33
2003 487.54 78.59 47.5 14.58 11.47 8.11 3.36 3.73 1.31 62.03 ... 0.2 68.47 10.93 14.04 28.93 0.47 5.81 8.3 6.81 14.98
2004 491.29 77.22 47.2 14.39 11.1 7.7 3.39 3.57 0.96 59.02 ... 0.2 72.57 11.1 15.58 30.23 0.5 5.98 9.18 6.97 14.87
2005 485.46 75.06 45.34 14.23 10.38 7.1 3.27 3.28 1.84 55.98 ... 0.21 74.26 11.25 14.53 32.15 0.54 6.37 9.41 7.12 15.13
2006 476.56 71.5 42.57 14.06 10.27 6.91 3.36 3.18 1.41 52.16 ... 0.21 74.44 11.08 16.81 31.08 0.5 6.68 8.29 7.21 15.24
2007 502.37 78.11 48.39 14.5 10.54 6.97 3.58 3.19 1.48 54.51 ... 0.22 78.3 11.72 17.28 32.48 0.41 7.16 9.25 7.31 15.7
2008 488.5 75.67 47.83 13.84 9.98 6.45 3.52 3.0 1.02 53.39 ... 0.25 79.4 11.61 18.22 32.68 0.34 7.54 9.0 7.39 15.76
2009 465.27 71.51 45.72 13.03 9.23 5.85 3.38 2.76 0.77 50.68 ... 0.23 78.8 11.77 19.44 32.15 0.31 6.42 8.71 7.35 13.31
2010 479.88 69.64 42.7 13.03 9.53 5.86 3.67 2.85 1.53 50.47 ... 0.19 82.93 12.15 21.8 33.26 0.37 6.34 9.02 7.4 15.63
2011 486.93 69.93 44.44 13.17 8.49 5.13 3.36 2.48 1.35 50.09 ... 0.22 83.23 12.16 21.36 34.96 0.38 7.14 7.23 7.52 16.94
2012 479.66 64.79 40.23 12.65 8.17 4.82 3.35 2.33 1.41 46.81 ... 0.19 83.34 12.45 21.15 35.03 0.42 7.35 6.95 7.62 16.47
2013 484.53 66.42 42.93 12.36 7.98 4.62 3.35 2.21 0.94 47.57 ... 0.16 84.53 12.96 22.13 35.36 0.44 7.09 6.55 7.76 17.15
2014 484.33 66.36 43.75 11.77 7.86 4.48 3.38 2.14 0.84 49.61 ... 0.15 83.68 12.98 21.54 35.57 0.43 7.28 5.88 7.93 17.76
2015 487.94 66.3 42.75 12.12 8.14 4.58 3.56 2.17 1.12 49.4 ... 0.16 80.45 13.55 20.62 33.6 0.39 6.92 5.36 8.13 18.35
2016 477.75 61.05 39.03 11.39 7.39 4.12 3.27 1.93 1.32 49.4 ... 0.16 77.47 14.54 20.03 31.43 0.41 6.35 4.71 8.33 18.55
2017 493.83 63.06 41.09 11.87 7.26 4.02 3.25 1.87 0.97 52.33 ... 0.18 81.71 15.07 20.55 33.24 0.49 7.27 5.09 8.46 18.99
2018 511.12 65.04 43.35 12.17 6.67 3.67 2.99 1.67 1.19 52.26 ... 0.17 85.34 15.68 21.64 34.76 0.55 7.43 5.28 8.64 19.57
2019 513.54 61.5 41.18 11.26 6.63 3.59 3.04 1.63 0.79 54.23 ... 0.18 86.28 16.21 21.29 34.73 0.53 7.48 6.04 8.77 18.84
2020 454.5 57.1 36.79 11.4 6.32 3.41 2.9 1.55 1.05 50.5 ... 0.09 78.18 14.01 18.73 32.34 0.65 7.06 5.39 8.82 18.0

21 rows × 45 columns

Now we take the Total values of GHG emmissions for each section to compare the GDP progress with the GHG emmissons to find the relation between them

In [44]:
t=Ghg.iloc[[0, 1, 9, 17, 28, 29, 36]]

total=t.transpose()
total.columns = total.iloc[0]
total=total.iloc[1:]
total = total.rename(columns={'Unnamed: 0': 'Year'})

total=total.round(2 )
total.index = total.index.astype(str)
total
Out[44]:
Source of GHG Total GHG Emissions Including Electricity (Mt of CO2e) Residential (Mt of CO2e) Commercial/Institutional (Mt of CO2e) Industrial (Mt of CO2e) Total Transportation (Mt of CO2e) Passenger Transportation (Mt of CO2e) Freight Transportation (Mt of CO2e)
2000 465.71 74.53 55.11 160.36 160.11 89.41 64.81
2001 463.93 74.14 56.54 158.73 159.52 87.72 65.24
2002 469.76 75.76 59.06 159.2 161.41 90.68 64.0
2003 487.54 78.59 62.03 165.74 166.21 90.92 68.47
2004 491.29 77.22 59.02 168.37 171.82 92.28 72.57
2005 485.46 75.06 55.98 164.39 174.9 93.51 74.26
2006 476.56 71.5 52.16 164.72 172.93 91.28 74.44
2007 502.37 78.11 54.51 174.26 179.79 94.18 78.3
2008 488.5 75.67 53.39 165.3 178.37 91.57 79.4
2009 465.27 71.51 50.68 153.23 176.54 90.38 78.8
2010 479.88 69.64 50.47 162.09 182.04 91.71 82.93
2011 486.93 69.93 50.09 168.17 181.81 91.06 83.23
2012 479.66 64.79 46.81 168.38 183.22 92.26 83.34
2013 484.53 66.42 47.57 166.85 186.54 94.25 84.53
2014 484.33 66.36 49.61 167.36 183.24 91.63 83.68
2015 487.94 66.3 49.4 171.68 182.22 93.65 80.45
2016 477.75 61.05 49.4 166.88 181.87 96.07 77.47
2017 493.83 63.06 52.33 171.76 187.69 97.53 81.71
2018 511.12 65.04 52.26 179.2 195.05 101.06 85.34
2019 513.54 61.5 54.23 181.7 197.27 102.22 86.28
2020 454.5 57.1 50.5 165.61 163.29 76.29 78.18
In [45]:
from pydoc import locate
from ipywidgets import Label
plt.figure(figsize=(8, 6))
plt.plot(total.index,total['Residential (Mt of CO2e)'], marker='o',label='Residential ')
plt.plot(total.index,total['Commercial/Institutional (Mt of CO2e)'], marker='o',label='Commercial/Institutional')
plt.plot(total.index,total['Industrial (Mt of CO2e)'], marker='o',label='Industrial')
plt.plot(total.index,total['Passenger Transportation (Mt of CO2e)'], marker='o',label='Passenger Transportation')
plt.plot(total.index,total['Freight Transportation (Mt of CO2e)'], marker='o',label='Freight Transportation')
plt.xlabel('Year')
plt.ylabel('Emissions (Mt of CO2e)')
plt.title('Total GHG Emissions Over the Years')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
No description has been provided for this image

By taking A closer Look at the second data set to understand the relation Between GDP and GHG in Canada

In [54]:
BC=CB.transpose()
BC.columns = BC.iloc[0]
BC=BC.iloc[1:]
BC = BC.rename(columns={'Unnamed: 0': 'Year'})
BC=BC.round(2 )

plt.figure(figsize=(10, 6))

plt.plot(BC.index, BC['Industrial'], label='Industrial')
plt.plot(BC.index, BC['Commercial/Institutional'], label='Commercial/Institutional')
plt.plot(BC.index, BC['Agriculture      '], label='Agriculture')


plt.xlabel('Years')
plt.ylabel('Million')
plt.title('Total GDP Millions Over Years')
plt.legend()
plt.grid(True)
plt.show()
No description has been provided for this image

By looking at the last two paragraphs whe can see that the Commercial domain is making a big fast progress in the income for the 20 years of our study and the industrial and the agriculture is having almost a stedy line but in the other side with all the fast progress the comercial domain is having the losest numbers of the other sections will the industrial department is dominating all other sections with the higest amouts of gas emmissions and that give us a big note to controle this emmissions

ofcource we are need to find the biggest factor for CO2 in each sector The first Data Visualising is the Freight Transportation we Choose the Columns from from G

In [55]:
Freight_Transportation=G[[ 'Light Trucks - Freight',' Medium Trucks', ' Heavy Trucks',
       'Air - Freight', 'Rail - Freight', ' Marine']]
ft=Freight_Transportation.transpose()
c=ft.sum(axis=1)
plt.figure(figsize=(7, 7))
plt.pie(c,labels=c.index,wedgeprops=dict(width=0.4),autopct='%1.1f%%')
plt.title("Freight Transportation Total by Category")
Out[55]:
Text(0.5, 1.0, 'Freight Transportation Total by Category')
No description has been provided for this image

we found that the Heavy Trucks is causing the highest amount of CO2 and then Medium Truck Now to create the Secont Graph we did the same steps but we choose the Pie chart while the first one was donat chart

In [56]:
Passenger_Transportation=G[[ 'Cars',
       'Light Trucks - Passenger', 'Buses', 'Air - Passenger',
       'Rail - Passenger']]
PT=Passenger_Transportation.transpose()
pt=PT.sum(axis=1)
plt.figure(figsize=(8, 8))
plt.pie(pt,autopct='%1.1f%%')
plt.title("Passenger Transportation Total by Category")
plt.legend(pt.index,loc='upper right')
Out[56]:
<matplotlib.legend.Legend at 0x25535568750>
No description has been provided for this image

from the chart above we can see that from the Total Passenger transportation the Cars is causing 42.5% of All ,this resault should encourage us to take a step to reduce this amount of gas emmissions as each one of us is responsable in this section to check on the Industry Gas Emmissions we sellected all needed columns from G and used Bar plot to view the Result

In [57]:
Industrial=G[[ 'Mining , Quarrying, and Oil and Gas Extraction', 'Pulp and Paper',
       'Iron and Steel', 'Smelting and Refining', 'Cement', 'Chemicals',
       'Petroleum Refining', 'Other Manufacturing', 'Forestry ',
       'Construction']]
I=Industrial.transpose()
i=I.sum(axis=1)
i.plot(kind='bar', stacked=True,color=(['#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a']))
plt.title("industrial Total by Category")
plt.ylabel('Emissions (Mt of CO2e)')
Out[57]:
Text(0, 0.5, 'Emissions (Mt of CO2e)')
No description has been provided for this image

we can see that the Mining ,and Gas Extraction is waving a red flag for the invernement with aount of CO2 ,Now to make a closer look at Residential and Commercial Secions we used Seaborn tho plot the next graphs

In [58]:
Residential=G[['Space Heating - Residential','Space Cooling - Residential','Water Heating - Residential','Lighting - Residential']
              ]
g = sns.catplot(
    data=Residential,
   kind="bar",
      alpha=.6, height=6)
g.set_xticklabels(['Space Heating ','Space Cooling ',
       'Water Heating ', 'Lighting '])

plt.ylabel('Emissions (Mt of CO2e)')
g.set_axis_labels("Sub Category", "Emissions (Mt of CO2e)")
plt.title("Residential Total by Category")
Out[58]:
Text(0.5, 1.0, 'Residential Total by Category')
No description has been provided for this image
In [59]:
Commercial=G[[ 'Space Heating - Commercial', 'Water Heating - Commercial',
       'Auxiliary Equipment', 'Auxiliary Motors', 'Lighting - Commercial',
       'Space Cooling - Commercial', 'Street Lighting']]
sns.set_theme(style="ticks", palette="pastel")
k=sns.boxplot(
             palette=["m", "g","y","b","r"],
            data=Commercial)
k.set_xticklabels(['S-Heat', 'W-Heat ',
       'Equipment', 'Motors', 'Lighting ',
       'Cooling ', 'St-Lighting'])
sns.despine(offset=10, trim=True)
C:\Users\hurdg\AppData\Local\Temp\ipykernel_15172\2750093320.py:5: UserWarning: 
The palette list has fewer values (5) than needed (7) and will cycle, which may produce an uninterpretable plot.
  k=sns.boxplot(
C:\Users\hurdg\AppData\Local\Temp\ipykernel_15172\2750093320.py:8: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.
  k.set_xticklabels(['S-Heat', 'W-Heat ',
No description has been provided for this image

It is not surpricing that the Place heating is producing the Highest numbers Gas Emmisions and we need to be responsable in our houses and work places to limit the emmssions as much as we can

Data Wrangling and Visualization for Guiding Question 2¶

To answer our second guiding question, we first aimed to identify the regions making the most significant contributions to total GHG emissions. We approached this by creating line plots to visualize the emission trends from 2000 to 2021. To do this, we defined a list of regions based on the last characters of our Excel files and used a for loop, along with an f string, to read these files. We filtered each file by a specific column and row using the loc function, transposed the data frames, and combined them. We then customized the column names and generated plots using Plotly.

In [60]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go


# Define a list of Provinces and Territories
regions = ["AB", "BC", "MB","NB","NL","NS","NT","NU","ON","PE","QC","SK","YT"]

# Initialize an empty DataFrame to store the combined data
combined_data = pd.DataFrame()

# Loop through each Province and Territory excel file and select the total GHG column
for region in regions:
    # Define the file name based on the region code from above
    file_name = f"EN_GHG_IPCC_{region}.xlsx"
    # Read the Excel file, and select the Summary sheet
    df = pd.read_excel(file_name, sheet_name='Summary')
  # Select rows and columns containing total GHG
    df = df.loc[[3, 5], 'Unnamed: 15':]
    # Rename rows
    df = df.rename(index={3: 'Year', 5: f'ghg_{region}'})
        # Transpose the DataFrame
    df = df.transpose()
        # Combine data for this region with the combined_data DataFrame
    if combined_data.empty:
        combined_data = df
    else:
        combined_data = pd.merge(combined_data, df, on="Year", how="outer")


# Define a list of appropriate column names
column_names = ['Year','Alberta',"British Columbia","Manitoba","New Brunswick","Newfoundland & Labrador","Nova Scotia","Northwest Territories","Nunavut","Ontario", "PE Island","Quebec","Saskatchewan","Yukon"]


# Set the column names of the DataFrame
combined_data.columns = column_names



# Create a line plot
fig = px.line(
    combined_data,
    x="Year",
    y=combined_data.columns[1:],
    color_discrete_sequence=px.colors.qualitative.Light24,
    title="Figure 1b. Total Greenhouse Gas Emissions by Province and Territory in Canada (2000-2021)")


fig.update_yaxes(title_text="GHG Emissions (kt CO_2 eq)")
fig.update_yaxes(dtick=20000)
fig.update_layout(
    width=850, #Adjust the width
    height=750# Adjust the height
)
#Removing legend title
fig.update_layout(
    legend_title_text="")




fig.update_layout(
    xaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray'),
    yaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray'),
    plot_bgcolor='white',  # Set background color
    legend=dict(orientation='h', yanchor='bottom', y=-0.2, xanchor='right', x=1)
)
#Allows for zooming into axis
fig.update_xaxes(automargin=True)
fig.update_yaxes(automargin=True)

fig.show()

For a different perspective on major contributors, we also created a pie chart based on the filtered data for the year 2019.

In [62]:
#extracting ghg values from dataframe combined_data
values = combined_data.loc[19, 'Alberta':'Yukon']

#Pie chart
fig = px.pie(values=values, names=values.index)

# Customize the layout
fig.update_layout(
    title='Figure 2b. Pie Chart of GHG Emissions for the Provinces and Territories of Canada ',

)

# Show the chart
fig.show()

GHG Per capita figure generation. In a separate section of code, we processed a population CSV file. After removing missing values and resetting the index, we filtered the data to retain only Q1 data for each year. We used regular expressions to extract the four-year digit and create a new column called "Year." Similarly, we did this for the two-character quarter data and placed it in another new column. We then filtered the data to include only rows corresponding to Q1. Following this, we sorted and reordered the columns to achieve a specific structure. From this refined DataFrame, named df_pop, we further filtered the data to obtain the 2019 population statistics. We followed a similar process for GHG data for the year 2019. Finally, we divided the GHG column by the population column and created a bar graph using Matplotlib. This bar plot visualizes the relationship between per capita GHG emissions for each province and territory for the year 2019 to facilitate comparative analysis.

In [63]:
import pandas as pd

import matplotlib.pyplot as plt

#Reading csv file
df_pop = pd.read_csv('Population.csv', encoding='latin-1').dropna(axis=0)

#Make row 0 as header
df_pop.columns = df_pop.iloc[0]

# Reset the index to exclude the first row
df_pop = df_pop[1:].reset_index(drop=True)

df_pop['Year'] = df_pop['Geography'].str.extract(r'(\d{4})')[0]
df_pop['Quarter'] = df_pop['Geography'].str.extract(r'(Q\d)')[0]

# Filter to show only Q1 for each year
df_pop = df_pop[df_pop['Quarter'] == 'Q1']

# Drop the 'Quarter' column if you no longer need it
df_pop = df_pop.drop(columns=['Quarter'])

#rest index
df_pop = df_pop[1:].reset_index(drop=True)

#sort columns by alphabebetical order
df_pop = df_pop[sorted(df_pop.columns)]
df_pop = df_pop[['Geography', 'Year', 'Canada'] + [col for col in df_pop.columns if col not in ['Geography', 'Year', 'Canada']]]


#For population data
# Extract the columns
columns_to_sort = df_pop.loc[:, 'Alberta':'Yukon']

# Sort the columns alphabetically
sorted_columns = sorted(columns_to_sort.columns)

# Reassign the sorted columns to the DataFrame
df_pops = df_pop[sorted_columns]
#Renaming the columns to match ghg columns
column_names_pop = ["Alberta", "British Columbia", "Manitoba", "New Brunswick", "Newfoundland & Labrador","Northwest Territories", "Nova Scotia", "Nunavut", "Ontario", "PE Island", "Quebec", "Saskatchewan", "Yukon"]


# Set the column names of the DataFrame
df_pops.columns = column_names_pop
row_2019_pop = df_pops.loc[20, :]


#For GHG data
# Extracting the columns
column_sorting_ghg = combined_data.loc[:, 'Alberta':'Yukon']
#Sorting the column alphabetically
sorted_columns_ghg =sorted(column_sorting_ghg.columns)


# Set the column names of the DataFrame
new_combined_data = combined_data[sorted_columns_ghg]
row_2019_ghg = new_combined_data.loc[20, :]



#Making sure data types are addressed
row_2019_pop = row_2019_pop.str.replace(',', '').astype(float)
row_2019_ghg = pd.to_numeric(row_2019_ghg, errors='coerce')

#Calculating per capita
per_capita_ghg = row_2019_ghg/row_2019_pop


#print(per_capita_ghg)
# Create a Pandas Series
per_capita_ghg = pd.Series(per_capita_ghg)



# Labeling the plot
plt.title('GHG Emissions Per Capita')
plt.xlabel('Provinces and Territories')
plt.ylabel('Emissions Per Capita')

per_capita_ghg = per_capita_ghg.sort_values(ascending=False)

# Plot a horizontal bar graph
per_capita_ghg.plot(kind='barh')

# Customize the plot
plt.title('Figure 3b. GHG Emissions Per Capita (Sorted)(2019)')
plt.xlabel('Emissions Per Capita')
plt.ylabel('Provinces and Territories')

# Show the plot
plt.show()
No description has been provided for this image

In this portion of the code the same was done as before with regards to reading the multiple excel files. Again, to do this, we defined a list of regions based on the last characters of our Excel files and used a for loop, along with an f string, to read these files. We filtered each file by a specific column and row using the loc function, transposed the data frames, and combined them. Here we wanted to filter GHG emissions coming from energy generation so we could use that value as the size of our bubbles in the scatter plot.

In [64]:
import pandas as pd

# Define a list of regions (e.g., Alberta, British Columbia, etc.)
regions = ["AB", "BC", "MB", "NB", "NL", "NT", "NS", "NU", "ON", "PE", "QC", "SK", "YT"]

# Initialize an empty DataFrame to store the combined data
combined_data = pd.DataFrame()


# Loop through each region
for region in regions:
    # Define the file name based on the region
    file_name = f"EN_GHG_IPCC_{region}.xlsx"

    # Check if the file exists

        # Read the Excel file
    df = pd.read_excel(file_name, sheet_name='Summary')


        # Select rows and columns containing the relevant data
    df = df.loc[[3, 6], 'Unnamed: 15':]

        # Rename rows
    df = df.rename(index={3: 'Year', 6: f'ghg_{region}'})
        # Transpose the DataFrame
    df = df.transpose()


        # Combine data for this region with the combined_data DataFrame
    if combined_data.empty:
        combined_data = df

    else:
        combined_data = pd.merge(combined_data, df, on="Year", how="outer")


# Extract row index 0 for energy sector ghg for the year 2000
energy_sec_2000 = combined_data.iloc[0]
energy_sector_2000 = energy_sec_2000[1:].tolist()
energy_sector_2000[7]=503
energy_sector_2000[5]=1487

# Extract row index 10 for energy sector ghg for the year 2010
row_10 = combined_data.iloc[10]
energy_sector_2010 = row_10[1:].tolist()

# Extract row index 19 for energy sector ghg for the year 2019
energy_sec_2019 = combined_data.iloc[19]
energy_sector_2019 = energy_sec_2019[1:].tolist()

This code segment is responsible for creating three separate dataframes for the years 2000, 2010, and 2019, each containing information about various Canadian provinces and territories. The data includes key attributes such as province codes, total greenhouse gas GHG emissions, population, and a "Bubble_Size" parameter related to the energy sectors GHG contributions. These dataframes are structured to enable a comparative analysis across the years.

To distinguish these dataframes by year, a 'Year' column is added to each, specifying the respective year (2000, 2010, or 2019). Afterward, these dataframes are combined into one consolidated dataframe named 'combined_df' using the 'pd.concat' function. The data is then utilized to create a scatter plot using Plotly Express. This scatter plot visualizes the relationship between population, total GHG emissions, and the bubble size(GHG emissions from energy generation) while distinguishing data points by the year to facilitate comparative analysis.

In [65]:
import pandas as pd
import plotly.express as px



# Extract row index 10 to get total GHG emission data for year 2000
year_2000 = combined_data.iloc[0]
total_ghg_2000 = year_2000[1:].tolist()
total_ghg_2000[7]=528
total_ghg_2000[5]=1530

# Extract row index 10 to get total GHG emission data for year 2010
year_2010 = combined_data.iloc[10]
total_ghg_2010 = year_2010[1:].tolist()

# Extract row index 19 to get total GHG emission data for the year 2019
year_2019 = combined_data.iloc[19]
total_ghg_2019 = year_2019[1:].tolist()




#2000 dataframe
pop_2000 = df_pop.loc[0, 'Alberta':'Yukon'].tolist()

data = {
    'Province': ["AB", "BC", "MB", "NB", "NL", "NT", "NS", "NU", "ON", "PE", "QC", "SK", "YT"],
    'Total_GHG_Emissions': total_ghg_2000,
    'Population': pop_2000,
    'Bubble_Size': energy_sector_2000
}

#2010 dataframe
df_2000 = pd.DataFrame(data)
pop_2010 = df_pop.loc[9, 'Alberta':'Yukon'].tolist()
data = {
    'Province': ["AB", "BC", "MB", "NB", "NL", "NT", "NS", "NU", "ON", "PE", "QC", "SK", "YT"],
    'Total_GHG_Emissions': total_ghg_2010,
    'Population': pop_2010,
    'Bubble_Size': energy_sector_2010
}

df_2010 = pd.DataFrame(data)

#2019 dataframe
pop_2019 = df_pop.loc[19, 'Alberta':'Yukon'].tolist()

data = {
    'Province': ["AB", "BC", "MB", "NB", "NL", "NT", "NS", "NU", "ON", "PE", "QC", "SK", "YT"],
    'Total_GHG_Emissions': total_ghg_2019,
    'Population': pop_2019,
    'Bubble_Size': energy_sector_2019
}

df_2019 = pd.DataFrame(data)


# Add a column to distinguish the data frames

df_2010['Year'] = '2010'
df_2000['Year'] = '2000'
df_2019['Year'] = '2019'

# Combine the two data frames
combined_df = pd.concat([df_2019, df_2010,df_2000], ignore_index=True)

# Create a scatter plot using Plotly Express
fig = px.scatter(
    combined_df,
    x='Population',
    y='Total_GHG_Emissions',
    size='Bubble_Size',
    color='Year',
    text ='Province',
    hover_name='Province',
    size_max=50

)


fig.update_layout(
    xaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray'),
    yaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray'),
    plot_bgcolor='white',  #

)



# Set axis titles
fig.update_xaxes(title_text='Population')
fig.update_xaxes(type='linear')
fig.update_yaxes(title_text='Total GHG Emissions (kt CO_2 eq)')

# Set the title
fig.update_layout(title='Figure 4b. Comparison of GHG Emissions and Population by Province/Territory (2000 vs. 2010 vs. 2019)')

# Enable zoom and pan options
fig.update_xaxes(automargin=True)
fig.update_yaxes(automargin=True)


# Show the plot
fig.show()

To further analyze per capita GHG emissions, the below code builds upon the above analyses with the goal of visualizing annual provincial/territorial per capita GHG emissions for all study years. The dataframes containing the total GHG emissions and the population data are re-frmatted and combined with a shapefile that contains provincial/territorial geometries. The resulting geo-dataframe is visualized through an interactive choropleth within the plotly environment.

In [67]:
#### The below code re-runs the intial data import and manipulation code to re-create the original GHG dataframes ##
# Define a list of Provinces and Territories 
regions = ["AB", "BC", "MB","NB","NL","NS","NT","NU","ON","PE","QC","SK","YT"]  

# Initialize an empty DataFrame to store the combined data
combined_data_copy = pd.DataFrame()

# Loop through each Province and Territory excel file and select the total GHG column 
for region in regions:
    # Define the file name based on the region code from above
    file_name = f"EN_GHG_IPCC_{region}.xlsx"    
    # Read the Excel file, and select the Summary sheet 
    df = pd.read_excel(file_name, sheet_name='Summary')
  # Select rows and columns containing total GHG 
    df = df.loc[[3, 5], 'Unnamed: 15':]
    # Rename rows
    df = df.rename(index={3: 'Year', 5: f'ghg_{region}'})
        # Transpose the DataFrame
    df = df.transpose()
        # Combine data for this region with the combined_data DataFrame
    if combined_data_copy.empty:
        combined_data_copy = df
    else:
        combined_data_copy = pd.merge(combined_data_copy, df, on="Year", how="outer")
        

# Define a list of appropriate column names
column_names = ['Year','Alberta',"British Columbia","Manitoba","New Brunswick","Newfoundland & Labrador","Nova Scotia","Northwest Territories","Nunavut","Ontario", "PE Island","Quebec","Saskatchewan","Yukon"]
    

# Set the column names of the DataFrame
combined_data_copy.columns = column_names

#Ammend the original population dataframes
#Reformat df_pops dataframe to incclude year - specify new dataframe name "df_pop_wUear"
sorted_columns.append('Year')
df_pops_wYear = df_pop[sorted_columns]
column_names_pop_wYear = ["Alberta", "British Columbia", "Manitoba", "New Brunswick", "Newfoundland & Labrador","Northwest Territories", "Nova Scotia", "Nunavut", "Ontario", "PE Island", "Quebec", "Saskatchewan", "Yukon", "Year"]
df_pops_wYear.columns = column_names_pop_wYear
In [68]:
#Melt regional GHG dataframe into 'long' format - required for mapping function
regionGHG_long = pd.melt(combined_data_copy, id_vars=['Year'], var_name='Province')
regionGHG_long.rename(columns={'value': 'GHG'}, inplace=True)
regionGHG_long['GHG'].fillna(0, inplace=True)
regionGHG_long[['Year','GHG']] = regionGHG_long[['Year','GHG']].astype(int)
regionGHG_long['Province'] = regionGHG_long['Province'].astype(str)

#Similarly melt the population dataframe
regionPop_long = pd.melt(df_pops_wYear, id_vars=['Year'], var_name='Province')
regionPop_long.rename(columns={'value': 'Population'}, inplace=True)
regionPop_long.replace(',','', regex=True, inplace=True)
regionPop_long[['Year','Population']] = regionPop_long[['Year','Population']].astype(int)
regionPop_long['Province'] = regionPop_long['Province'].astype(str)


#Merge the population and GHG dataframes and create a per-capita variable - multiply by 1000 to simplify units 
regionPerCap_long = regionPop_long.merge(regionGHG_long, on=['Year','Province'])
regionPerCap_long['ghg_percapita'] = regionPerCap_long['GHG']/regionPerCap_long['Population'] *1000

#Import shapefile of provincial/territorial geometries
#Format the dataframe to mesh with the per-capita dataframe - specify function to remove unwanted characters from province column
import geopandas as gpd
geo_df = gpd.read_file("C:/Users/hurdg/OneDrive/Documents/MDSA/DATA 601/Data 601 - Project/Datasets/georef-canada-province-millesime.shp")     
geo_df.rename(columns={'prov_name_e':'Province'}, inplace=True)
geo_df = geo_df.loc[:, ['Province', 'geometry']]

def remove_unwanted(x):
    x = x.replace("]", "")
    x = x.replace("[", "")
    return x.translate(str.maketrans({"'":None}))


geo_df['Province'] = geo_df['Province'].apply(remove_unwanted)

#Merge per-capita dataframe with geometries
geo_df = geo_df.merge(regionPerCap_long, on = 'Province')

#Create json using 2010 data (arbitrary) to act as geospatial reference for choropleth
gdf_2010 = geo_df[geo_df['Year'] == 2010]
geo_df['Year'] = geo_df['Year'].astype(str)
json_2010 = gdf_2010.to_json()
import json
json_2010 = json.loads(json_2010)


#Create choropleth within plotly
fig = px.choropleth(
    geo_df,
    locations="Province",
    featureidkey="properties.Province", ### ! changed !
    geojson=json_2010,
    color=geo_df.ghg_percapita.astype(float),
    hover_name="Province",
    range_color=(0, 100),
    color_continuous_scale='YlOrBr',
    animation_frame=geo_df.Year.astype(str)
)

#Format slider bar label and font
sliders = [dict(
currentvalue={"prefix": "<b>Year </b>"},
font=dict(size=20,color="Black"))]
fig.update_layout(sliders=sliders)

#Specify figure title
fig.update_layout(
    title_text = 'Figure 5b.Per Capita GHG Emissions',
    #Specify colorbar to appear on left side of map
    coloraxis_colorbar_x=-0.1)

#Snap initial figure frame around map
fig.update_geos(fitbounds="locations", visible=False)

fig.show()

Findings and Analysis of Guiding Question 2¶

The visualizations from both line plots (figure 1b) and pie charts (figure 2b) indicated that Alberta is the largest contributor to GHG emissions, followed by Ontario, Quebec, Saskatchewan, British Columbia, Manitoba, Nova Scotia, New Brunswick, Newfoundland and Labrador, PE Island, Northwest Territories, Nunavut, and Yukon. Alberta alone accounts for approximately 38% or more than a third of all GHG emissions, with Ontario as the next significant contributor, making up 22.6%, which is just under a quarter of the total emissions. However, this method doesn't account for variations in regional population sizes.

When considering per capita GHG emissions, as demonstrated in figure 3b for the year 2019, it becomes clear that the top three per capita emitters, in order, are Alberta, Saskatchewan, and the Northwest Territories. This suggests that individuals in these regions have a relatively high environmental impact in terms of GHG emissions. Despite their smaller populations compared to provinces like Ontario or Quebec, residents in these regions are contributing more emissions per person. This underscores the need for tailored policies and strategies at both the provincial and individual levels to address this impact effectively.

The objective of the comparative visualization in figure 4b was to address the question of how various provinces and territories were managing their GHG emissions over time. The bubble graph proved to be an effective tool for this analysis, allowing us to not only observe trends in GHG emissions but also population growth (on the x-axis) and the size of bubbles representing GHG emissions from the energy sector. Our examination of this visualization revealed a common trend among all provinces: an increase in population between the years 2000, 2010, and 2019. However, there were distinct patterns in GHG emissions management. Notably, Ontario, Nova Scotia, New Brunswick, Prince Edward Island, and the Northwest Territories demonstrated a commendable reduction in emissions over time. A closer look at the data and consultation with relevant literature provided insights into the reasons behind these changes. For instance, Ontario's emissions reduction was primarily attributed to the closure of coal-fired electricity generation plants, as reported by Environment and Climate Change Canada (2023). Quebec also displayed a slight decreasing trend in emissions, which could be linked to reductions in the residential sector, as well as changes in aluminum production and petroleum refining industries. Conversely, British Columbia showed relatively stable emissions. The remaining provinces experienced emissions increases in tandem with population growth. Alberta and Saskatchewan, in particular, saw notable increases due to heightened activities in the oil and gas industries, as noted in the Environment and Climate Change Canada report (2023).

Given the overall goal of reducing GHG emissions, the total amount of emissions particular to a province/territory can be taken to represent the severity of the issue in that region. However, the severity does not necessarily correlate directly with potential for improvement. In many ways, population and GHG emissions are directly linked. Efforts aimed at reducing GHG emissions are likely best focused on the areas with the most potential for improvement, which are not necessarily the areas where the problem is most severe. Figure 5b was created to highlight htis concept. Whereas Ontario was one of the highest GHG emitting provinces on an absolute level, suggesting a sever issue, it's per capita output is relatively modest. On the other hand, Alberta and Saskatchewan were identified amonth the highest total GHG emitters, and remained high at the per capita level. This suggests that the GHG emissions of the regions are severe, but that there may be a relatively high potential for improvement. Given this framing, it is also important to remember the other contribtuting factors associated with GHG emissions. In the same way that population are GHG emissions are somewhat inseperable, so too is economic activity. As stated above, Alberta contributes substantially to the oil and gas industries, which may have a substantial impact on the hypothesized potential for reduction in GHG emissions. Additional studies into the GHG emission profile of Alberta, and also Saskatchewan, will be essential for the further identification of potential areas for GHG emission reductions in Canada.

Conclusion¶

In summary, our analysis of GHG emissions in Canadian provinces and territories reveals a clear hierarchy of contributors, with Alberta leading the way, followed by Ontario, Quebec, Saskatchewan, and other regions. Further resolution to these spatial trends was provided by per capita modelling, which revealed a consistently high GHG output form Alberta, and to a lesser extent, Saskatchewan. These findings underscore the significance of tailored policies, especially when considering per capita emissions, where Alberta and Saskatchewan stand out as high-impact areas. A closer examination of GHG emissions management over time highlights positive trends in several provinces, such as emissions reduction in Ontario, Nova Scotia, New Brunswick, Prince Edward Island, and the Northwest Territories. These reductions are often tied to specific actions, like the closure of coal-fired plants in Ontario and changes in the industrial landscape in Quebec. Conversely, Alberta and Saskatchewan experience emissions increases, largely driven by activities in the oil and gas sectors. This comprehensive analysis calls for region-specific strategies to address GHG emissions effectively. We acquired valuable skills through this process, including proficient data wrangling techniques, the capability to efficiently iterate through multiple datasets simultaneously, and the ability to construct informative dashboards. As for the economic impact of GHG emissions, there has been a steady increase in GHG emissions for the industrial, freight and passenger transportation sectors, in contrast, there has been a marginal decrease in the residential and commercial sectors over the same period. Our analysis shows that despite the efforts by the government to reduce GHG emissions across Canada, there continues to be an upward trend from the industrial and transportation sectors, this is also evident with the continued rise in GDP for most of the period.

As we dived deeper into the topic of GHG emissions, we acquired valuable skills, including proficient data wrangling techniques, the capability to efficiently iterate through multiple datasets simultaneously, and the ability to construct informative dashboards.

Reference List¶

  1. Boyce, S. (2023). Political Governance, Socioeconomics, and Weather Influence Greenhouse Gas Emissions across Subnational Jurisdictions in Canada (Doctoral thesis; University of Alberta)

  2. Natural Resources Canada - Office of Energy Efficiency - Demand Policy and Analysis Division. (2023, February 28). Commodity prices and background indicators. Government of Canada, Natural Resources Canada. https://oee.nrcan.gc.ca/corporate/statistics/neud/dpa/showTable.cfm?type=HB&sector=aaa&juris=ca&year=2020&rn=5&page=0

  3. Natural Resources Canada - Office of Energy Efficiency - Demand Policy and Analysis Division. (2023, March 14). Canada’s GHG emissions by Sector, end use and subsector – including electricity-related emissions. Government of Canada, Natural Resources Canada. https://oee.nrcan.gc.ca/corporate/statistics/neud/dpa/showTable.cfm?type=HB&sector=aaa&juris=ca&rn=3&page=0

  4. Slav, I. (2022, August 11). Alberta oil output hits record high. OilPrice.com. https://oilprice.com/Latest-Energy-News/World-News/Alberta-Oil-Output-Hits-Record-High.html

  5. Canada, N. R. (2023). Natural Resources Canada. Natural Resources Canada - GHG Emissions by Sector, End Use and Subsector. https://oee.nrcan.gc.ca/corporate/statistics/neud/dpa/showTable.cfm?type=HB&sector=aaa&juris=ca&rn=3&page=0

  6. Environment and Climate Change Canada (2023). Canadian Environmental Sustainability Indicators: Global greenhouse gas emissions. Consulted on Sept 23, 2023. www.canada.ca/en/environment-climate-change/services/environmental-indicators/globalgreenhouse-gas-emissions.html.

  7. Environment and Climate Change Canada (2023a).Water governance and legislation: provincial and territorial. Retrieved on Oct 10, 2023 from https://www.canada.ca/en/environment-climate-change/services/water-overview/governance-legislation/provincial-territorial.html

  8. Government of Canada, C. E. R. (2023). Canada energy regulator / Régie de l’énergie du Canada. CER. https://www.cer-rec.gc.ca/en/data-analysis/energy-markets/provincial-territorial-energy-profiles/provincial-territorial-energy-profiles-explore.html

  9. Helbling, T. (2013). On the Rise: High prices and new technology have triggered a sudden surge in oil and gas production in the United States that could shake up global energy markets. Finance & Development, 50(1), ISBN: 9781475542684. International Monetary Fund. DOI: https://doi.org/10.5089/9781475542684.022

  10. McKinney,W.(2010). Data Structures for Statistical Computing in Python. Proceedings of the 9th Python In Science Conference 445: 51-56.https://conference.scipy.org/proceedings/scipy2010/pdfs/mckinney.pdf

  11. Natural Resources Canada - Office of Energy Efficiency - Demand Policy and Analysis Division. (2023). Commodity prices and background indicators. Natural Resources Canada https://oee.nrcan.gc.ca/corporate/statistics/neud/dpa/showTable.cfm?type=HB&sector=aaa&juris=ca&year=2020&rn=5&page=0

  12. Osler. (2023.). Canadian Climate Change Policy Developments. Retrieved from https://www.osler.com/en/resources/regulations/focus/canadian-climate-change-policy-developments

  13. Data Network Team. (n.d.). Provinces and territories - Canada, (2023), Opendatasoft. Retrieved October 3, 2023 from https://data.opendatasoft.com/explore/dataset/georef-canada-province%40public/information/ ?disjunctive.prov_name_en&location=3,63.54855,-70.22461&basemap=jawg.streets

  14. Statistics Canada. Table 17-10-0009-01 Population estimates, quarterly. Consulted on Sept 23, 2023. DOI: https://doi.org/10.25318/1710000901-eng

  15. United Nations, Department of Economic and Social Affairs, Population Division (2022). World Population Prospects: The 2022 Revision, custom data acquired via website

  16. Plot Types — Matplotlib 3.8.0 Documentation. https://matplotlib.org/stable/plot_types/index.html. Accessed 9 Oct. 2023.

  17. Wasom, M. L., (2021). Seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021, https://doi.org/10.21105/joss.03021

  18. Plotly Technologies Inc (2015). Collaborative data science. https://plot.ly.

  19. Wlodkowski, P. Making Interactive Choropleth Maps with Temperature Data, (2021), GitHub repository. Retrieved October 1, 2023 from https://github.com/pawlodkowski/interactive_climate_map

  20. Vu, T. (2022, April 24). Thu-VU92/python-dashboard-panel: Interactive Visualization Dashboard in python with panel. GitHub. https://github.com/thu-vu92/python-dashboard-panel

  21. Vu, T. (2022, March 9). How to create a beautiful python visualization dashboard with panel/hvplot. YouTube. https://www.youtube.com/watch?v=uhxiXOTKzfs&ab_channel=ThuVudataanalytics

  22. nkmk. (2023, August 8). Pandas: Transpose DataFrame (swap rows and columns). pandas: Transpose DataFrame (swap rows and columns). https://note.nkmk.me/en/python-pandas-t-transpose/

  23. Holoviz Contributors. (n.d.-a). Fast List Template. FastListTemplate - Panel v1.2.3. https://panel.holoviz.org/reference/templates/FastListTemplate.html

  24. Stratis, K., Santos, A., Weber, B., Hjelle, G., Jablonski, J., Schmitt, J., Finegan, K., & Breuss, M. (2023, February 7). Combining data in pandas with merge(), .join(), and CONCAT(). Real Python. https://realpython.com/pandas-merge-join-and-concat/

  25. Kelsey Jordahl, Joris Van den Bossche, Martin Fleischmann, Jacob Wasserman, James McBride, Jeffrey Gerard, … François Leblanc. (2020, July 15). geopandas/geopandas: v0.8.1 (Version v0.8.1). Zenodo. http://doi.org/10.5281/zenodo.3946761

  26. Environment and Climate Change Canada (2023). Canadian Environmental Sustainability Indicators: Greenhouse Gas Emissions. Consulted on October 15, 2023. Available at https://www.canada.ca/en/environment-climate-change/services/environmental-indicators/greenhouse-gas-emissions.html

In [ ]: